Introduction

At the time of writing, the second season of The Witcher is trending on Netflix. A fan-favorite song from the series is "Toss a Coin to Your Witcher". So, in honor of the Witcher, let’s toss some coins.

Inspiration for a gambling game

I once read about a gambling game in Greed by Marc Elsberg, a thriller about a financial meltdown. It goes like this (spoiler alert): A guy offers the following game to 10 people in a bar. Each of the 10 people pays the guy 10€, and then each of them individually tosses a fair coin 100 times. Every time they toss heads, their stake is multiplied by 1.5, and every time they toss tails, it is multiplied by 0.6. After 100 coin tosses they are paid back the amount they ended with.
Example: I pay 10€ and toss heads on my first turn, so now I’m at 15€. In the next round I toss heads again, so now I’m at 22.5€. In the third round I toss tails, so now I’m at 13.5€. After 100 rounds I might be at 7€, so I am paid back 7€ and have therefore lost 3€ (10€ paid, 7€ won).
The story continues with one man convincing the other nine to play because, as he learned in school, the average factor per toss in this game must be \(\frac12 \cdot 1.5 + \frac12 \cdot 0.6 = 1.05\), and thus each toss has an expected win factor of 1.05. What follows is that all 10 of them play and lose, and the guy, who just won almost 100€, lectures them on mathematics, on why not to play a game like this, and on what this has to do with the crash of economies.

I was fascinated by this game and wondered what went wrong in the man’s reasoning. So I simulated the game and played around with different numbers of players and factors. But I couldn’t replicate the conclusion of the guy who offered the game, because quite often the game was a loss for me (in this scenario I’m the one offering the game). So I began to think that it doesn’t make sense for me to offer this game. At the same time, most or even all of the players lost the game, so I wouldn’t want to play it either. It seemed that I had found a game that is advisable neither to play nor to offer.

In this small project I want to analyse more specifically why this is the case and use this opportunity to learn more about the geometric mean and log-normal distributions.
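To make the contrast between the two means concrete, here is a minimal sketch in R. It only uses the numbers from the story (factors 1.5 and 0.6, a stake of 10€, 100 rounds); the variable names are chosen for illustration.

```r
Win        <- 1.5   # factor for heads
Lose       <- 0.6   # factor for tails
rounds     <- 100
Startvalue <- 10

# Arithmetic mean of the factors: growth of the *expected* stake per round
arith_mean <- 0.5 * Win + 0.5 * Lose

# Geometric mean of the factors: growth per round along a path with
# equally many heads and tails
geom_mean <- sqrt(Win * Lose)

# Expected stake after 100 rounds (driven by a few extremely lucky paths)
expected_stake <- Startvalue * arith_mean^rounds

# Stake after exactly 50 heads and 50 tails
typical_stake <- Startvalue * geom_mean^rounds

print(c(arith_mean = arith_mean, geom_mean = geom_mean))
print(c(expected_stake = expected_stake, typical_stake = typical_stake))
```

The arithmetic mean is above 1 while the geometric mean is below 1, which is the tension the rest of this project looks at.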

Simulation

One Game

We start by writing a function that simulates the game for a single player. It doesn’t matter which result (heads or tails) counts as a win or a loss, but for the sake of writing we say that heads is a win and tails is a loss.

One_Game function

Arguments:

  • Startvalue: A numeric (> 0) that represents the stake of a game (the amount of money the player pays to play). Default is set to 10.
  • Win: A numeric (> 1) that represents the factor by which the current stake is multiplied when tossing heads. Default is set to 1.5.
  • Lose: A numeric (0 < Lose < 1) that represents the factor by which the current stake is multiplied when tossing tails. Default is set to 0.6.
  • rounds: A numeric whole number (\(\geq 1\)) that represents the number of played rounds, i.e. how often the coin is tossed. Default is set to 100.
  • head_prob: A numeric (0 < head_prob < 1) that represents the probability for heads. Default is set to 0.5 (fair coin).

Outcome: A vector of length rounds+1 that contains the stake of the player after each round, starting with Startvalue as the first element and ending with the amount of money the player is paid back.

One_Game <- function(Win = 1.5, Lose = 0.6, Startvalue = 10, rounds = 100, head_prob = 0.5){

  # Input checks (each condition is reported separately if it fails)
  stopifnot(is.numeric(Win), is.numeric(Lose), is.numeric(Startvalue),
            is.numeric(rounds), is.numeric(head_prob),
            Startvalue > 0, Win > 1, Lose > 0, Lose < 1,
            rounds %% 1 == 0, rounds >= 1,
            head_prob > 0, head_prob < 1)

  # Vector of length rounds that contains the results of the coin tosses (0 for tails and 1 for heads)
  Outcomes <- sample(0:1, size = rounds, prob = c(1 - head_prob, head_prob), replace = TRUE)

  # Vector that contains the stake at each round based on the Outcomes above:
  # the Startvalue times the running product of the factors.
  # Spielverlauf is German for "course of the game"; it has length rounds + 1.
  Spielverlauf <- Startvalue * c(1, cumprod(ifelse(Outcomes == 1, Win, Lose)))

  return(Spielverlauf)
}

If the last value in the vector is bigger than Startvalue the player won money. Otherwise he lost.

Example with default values:

##   [1] 10.00000000  6.00000000  9.00000000 13.50000000  8.10000000  4.86000000
##   [7]  2.91600000  4.37400000  6.56100000  9.84150000  5.90490000  3.54294000
##  [13]  5.31441000  3.18864600  1.91318760  2.86978140  1.72186884  1.03312130
##  [19]  1.54968196  0.92980917  0.55788550  0.33473130  0.50209695  0.75314543
##  [25]  1.12971815  0.67783089  0.40669853  0.61004780  0.36602868  0.21961721
##  [31]  0.32942581  0.49413872  0.29648323  0.44472485  0.26683491  0.16010094
##  [37]  0.24015142  0.36022712  0.21613627  0.12968176  0.07780906  0.11671359
##  [43]  0.17507038  0.26260557  0.39390836  0.23634502  0.14180701  0.08508421
##  [49]  0.05105052  0.03063031  0.04594547  0.02756728  0.04135092  0.02481055
##  [55]  0.01488633  0.02232950  0.03349425  0.05024137  0.03014482  0.01808689
##  [61]  0.01085214  0.01627820  0.02441731  0.01465038  0.02197558  0.03296336
##  [67]  0.01977802  0.02966703  0.04450054  0.02670033  0.04005049  0.06007573
##  [73]  0.09011360  0.13517040  0.20275560  0.30413339  0.45620009  0.27372005
##  [79]  0.16423203  0.09853922  0.05912353  0.08868530  0.13302795  0.19954192
##  [85]  0.29931288  0.44896932  0.67345398  0.40407239  0.24244343  0.36366515
##  [91]  0.54549772  0.81824658  1.22736988  0.73642193  0.44185316  0.66277973
##  [97]  0.39766784  0.23860070  0.14316042  0.21474063  0.32211095

So the player above ended with approximately 32 cents, meaning he lost almost all of the 10€ he started with.
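As a small cross-check of this example, the number of heads can be recovered from the final stake alone, because the final stake equals Startvalue times Win to the power of the number of heads times Lose to the power of the number of tails. The sketch below assumes the default parameters and copies the last printed value from the output above.

```r
Startvalue <- 10; Win <- 1.5; Lose <- 0.6; rounds <- 100
final_stake <- 0.32211095   # last value of the example output above

# Solve final_stake = Startvalue * Win^k * Lose^(rounds - k) for k
k <- log(final_stake / (Startvalue * Lose^rounds), base = Win / Lose)
heads <- round(k)
print(heads)

# Cross-check: rebuild the final stake from the recovered head count
rebuilt <- Startvalue * Win^heads * Lose^(rounds - heads)
print(rebuilt)
```

The player tossed 52 heads in 100 rounds, i.e. slightly more heads than tails, and still lost almost everything.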

Many Games

Next, we want to create a data frame that contains the results of One_Game for any number of players.

Many_Games function

Arguments:

  • players: A numeric (\(\geq 1\)) that represents the number of players. All of them play the same version of the game, i.e. same Startvalue, rounds, Win-factor, Lose-factor, …
  • ...: All the arguments from the One_Game function with the same default values.

Outcome: A data frame with rounds+1 rows that contains the stakes of the players (one column per player) for all rounds (starting with Startvalue in row 1 and ending with the amount of money they are paid back).

Many_Games <- function(players = 10, rounds = 100, ...){

  # Input checks (all remaining arguments are checked inside One_Game)
  stopifnot(is.numeric(players), players >= 1)

  # Creating an empty data frame of the needed dimensions
  df <- data.frame(matrix(nrow = (rounds + 1), ncol = players))

  # Filling the data frame column by column with the results of One_Game
  for(i in seq_len(ncol(df))){
    df[, i] <- One_Game(rounds = rounds, ...)
  }

  return(df)
}

Example with default values:

As we can see on page 11 of the table (the last row), all players lost the game. But player 8 lost “only” 5€, which is “only” two heads away from winning (\(5 \cdot 1.5 \cdot 1.5 = 11.25 > 10\)).
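For larger numbers of players, the same kind of table can also be produced without a loop. The following is a minimal sketch of a vectorized alternative; `simulate_paths` is a hypothetical helper name that does not appear in the code above, and its column names (V1, V2, …) differ from those of Many_Games.

```r
# Hypothetical vectorized alternative (not the function used in this text)
simulate_paths <- function(players = 10, rounds = 100, Win = 1.5, Lose = 0.6,
                           Startvalue = 10, head_prob = 0.5) {
  # One column per player, one row per coin toss (1 = heads, 0 = tails)
  tosses <- matrix(rbinom(rounds * players, size = 1, prob = head_prob),
                   nrow = rounds, ncol = players)

  # Replace each toss by its factor; ifelse keeps the matrix shape
  factors <- ifelse(tosses == 1, Win, Lose)

  # Running product per player; matrix() guards the case rounds = 1
  growth <- matrix(apply(factors, 2, cumprod), nrow = rounds)

  # Prepend the starting row and scale by the stake
  as.data.frame(rbind(1, growth) * Startvalue)
}

set.seed(1)                    # arbitrary seed, only for reproducibility
paths <- simulate_paths()
print(dim(paths))              # rounds + 1 rows, one column per player
print(paths[nrow(paths), ])    # final stakes of all players
```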

Many Games Visualized

To get a better understanding of the game we will create a random walk plot that shows us the course of the games of all players. To do so, we write the function Many_Games_plot.

Many_Games_plot function

Arguments: All the arguments from the Many_Games function with the same default values.

Outcome: A plot that shows the course of the game for each player (black lines).

Graphical components:

  • The y-axis represents the stake. It uses a logarithmic scale because the effects of the coin tosses are multiplicative.
  • The red line represents the Startvalue. If a black line ends above the red line, this player made a profit.
  • The blue line represents the total stakes, i.e. Startvalue*players. If a black line ends above the blue line, this player alone is paid back more than all stakes combined. This is an especially important result, because it means that the game provider lost money: he has to pay the winner more than he collected from all players together.
  • The pink line represents the typical (approximately median) stake in each round for a fair coin, which shrinks by the geometric mean sqrt(Win*Lose) per round. It is not the expected value, which grows by the arithmetic mean of the factors per round.

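# Packages needed for the plot: ggplot2 for plotting and reshape2 for melt()
# (assumption: melt() comes from reshape2; the imports are not shown in this text)
library(ggplot2)
library(reshape2)

Many_Games_plot <- function(Startvalue = 10, rounds = 100, players = 10, Win = 1.5, Lose = 0.6, ...){

  # Simulate the games: one column per player, one row per round (plus the start)
  df <- Many_Games(Startvalue = Startvalue, rounds = rounds, Win = Win, Lose = Lose, players = players, ...)

  # Row number as x-value (index 1 is the Startvalue)
  df$index <- 1:nrow(df)

  # Long format: one row per player ("Spieler") and round
  df_long <- melt(df, id.vars = 'index', variable.name = 'Spieler')

  # Note: line width and transparency are scaled with log10(players) and
  # log10(rounds), so this only works for players > 1 and rounds > 1
  g <- ggplot(df_long, aes(index, value)) +
    geom_line(aes(group = Spieler), alpha = 1/log10(rounds), size = 1/log10(players)) +
    scale_y_log10() +
    geom_function(fun = function(index) Startvalue*((Win*Lose)^0.5)^(index-1), color = "pink", size = 2/log10(players)) +
    geom_hline(yintercept = Startvalue, color = "red", linetype = "dashed") +
    geom_hline(yintercept = Startvalue*players, color = "blue", linetype = "dotdash") +
    labs(title = "Sample Coin Toss Game", x = "Round", y = "Stake at given round") +
    theme_bw()

  return(g)
}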

Examples:

Many_Games_plot()

Many_Games_plot(players = 100,rounds = 500)

As we see, in both examples almost all players lose, and as expected most of them do so exponentially fast. But unfortunately for us, the game host, there are players who are so lucky that they win more money than the total stakes. So although almost all players lost the game, we lost money as well.
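The gap between the typical player and the lucky few can be checked numerically. The sketch below does not use the functions above; it relies on the fact that the final stake depends only on the number of heads, so the heads can be drawn directly with `rbinom`. The number of players (10001) and the seed are arbitrary choices for illustration.

```r
set.seed(123)                        # arbitrary seed, only for reproducibility
Startvalue <- 10; Win <- 1.5; Lose <- 0.6
rounds <- 100; players <- 10001      # many players to stabilise the median

# The final stake depends only on the number of heads of each player
heads <- rbinom(players, size = rounds, prob = 0.5)
final <- Startvalue * Win^heads * Lose^(rounds - heads)

median_sim   <- median(final)                          # simulated median
median_curve <- Startvalue * sqrt(Win * Lose)^rounds   # geometric mean curve
mean_sim     <- mean(final)                            # simulated mean
mean_theory  <- Startvalue * (0.5 * Win + 0.5 * Lose)^rounds

print(c(median_sim = median_sim, median_curve = median_curve))
print(c(mean_sim = mean_sim, mean_theory = mean_theory))
print(mean(final > Startvalue))      # share of players who end with a profit
```

The simulated median sits on the geometric mean curve, while the simulated mean is dominated by a handful of very lucky players and fluctuates strongly from run to run.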

After running this game over and over again (with 100 players and 500 rounds), we can see that there are a lot of cases where we beat all players, but from time to time there is a player who wins an insane amount of money, and we lose more than we won in all the previous games combined. Of course this total loss takes longer to occur the smaller the number of players and the larger the number of rounds. It also takes longer the smaller the geometric mean is. The result is also very sensitive to the geometric mean, because the typical (median) stake after n rounds, which is determined by the geometric mean, decreases exponentially. So a small change in the geometric mean has a huge long-term effect on the results.
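The host's point of view can be sketched in the same way. The code below simulates many independent evenings of the original game (10 players, 100 rounds) and records the host's profit per evening. The number of repetitions and the seed are arbitrary assumptions, and the heads are again drawn directly with `rbinom` instead of simulating every single toss.

```r
set.seed(2022)                       # arbitrary seed, only for reproducibility
Startvalue <- 10; Win <- 1.5; Lose <- 0.6
rounds <- 100; players <- 10; games <- 10000

# One row per game, one column per player: number of heads of each player
heads <- matrix(rbinom(games * players, size = rounds, prob = 0.5), nrow = games)

# Amount paid back to each player at the end of each game
payout <- Startvalue * Win^heads * Lose^(rounds - heads)

# Host profit per game: collected stakes minus everything paid back
profit <- players * Startvalue - rowSums(payout)

print(mean(profit < 0))    # share of games in which the host loses money
print(median(profit))      # typical result of a single game
print(min(profit))         # worst single game for the host
print(sum(profit))         # balance over all simulated games
```

In a run like this the host wins a modest amount in most games, while the rare losses can be far larger than any single win.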

Math

Transformation of the random variable

To get a better understanding of the process that leads to this distribution, we will now take a look at the underlying mathematics.

Let \(Z\) be a random variable that represents the number of heads in the \(n\) coin tosses with probability \(p\) for heads, i.e. \(Z \sim B(n,p)\). Let \(s\) be the Startvalue of the game and \(f_w\) and \(f_l\) the multiplicative factors for winning and losing.
Let \(X_n\) be a random variable that represents the stake after \(n\) rounds. We can describe \(X_n\) as follows: \[ X_n = s\cdot f_w^Z \cdot f_l^{n-Z} = s\cdot f_l^n \cdot \left(\frac{f_w}{f_l}\right)^Z\] By applying some transformations we can see that a shifted and rescaled version of \(\log(X_n)\) follows a binomial distribution and is therefore, for large enough \(n\), approximately normally distributed.

\[\begin{aligned} & a := \frac{f_w}{f_l} \\ & \log_a(X_n) = \log_a(s\cdot f_l^n) + Z \\ \Rightarrow & \log_a\left(\frac{X_n}{s\cdot f_l^n}\right) = Z \sim B(n,p) \end{aligned}\]

Question: Do the factor \(s\cdot f_l^n\) and the base \(a\) matter for the conclusion?
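One possible way to approach this question is to write the logarithm of \(X_n\) in the natural base and to use the fact that a shifted and rescaled normal variable is still normal. The following is a sketch under the normal approximation \(Z \dot\sim N(np, np(1-p))\); it uses only the symbols defined above.

```latex
\begin{aligned}
\ln(X_n) &= \ln(s) + n\ln(f_l) + Z\cdot\ln\left(\frac{f_w}{f_l}\right) \\
\Rightarrow \ln(X_n) &\;\dot\sim\; N\left(\ln(s) + n\big(p\ln(f_w) + (1-p)\ln(f_l)\big),\; np(1-p)\cdot\ln\left(\frac{f_w}{f_l}\right)^2\right) \\
\log_b(X_n) &= \frac{\ln(X_n)}{\ln(b)} \quad \text{for any base } b > 1
\end{aligned}
```

Under this sketch the factor \(s\cdot f_l^n\) only shifts the distribution and the base only rescales it, so neither changes the conclusion that the logarithm of \(X_n\) is approximately normal, i.e. that \(X_n\) is approximately log-normal. It also suggests that the median of \(X_n\) is approximately \(s\cdot(f_w^p\cdot f_l^{1-p})^n\), which for a fair coin is \(s\cdot(\sqrt{f_w\cdot f_l})^n\).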

Visualizing the transformed random variable

To check the validity of this statement, we can take one result of the Many_Games function, apply the transformation from above to all the end results and plot them. To do so we quickly write another function that does just that.

Log_Games function:

Arguments: All the arguments from the Many_Games function with the same default values, except for the default value of players. We set the default value of players to 50 so that the histogram has enough observations by default to be compared with the normal approximation (the central limit theorem itself applies to the number of rounds).

Outcome: A plot that shows the distribution of the transformed end results of Many_Games. The red line represents the empirical distribution, estimated with a kernel density estimate, and the blue line represents the normal approximation of the distribution of \(Z\), i.e. of \(B(n,p)\).

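# Packages needed: ggplot2 for plotting, dplyr for %>%, mutate() and filter(),
# reshape2 for melt() (assumption: the imports are not shown in this text)
library(ggplot2)
library(dplyr)
library(reshape2)

Log_Games <- function(Startvalue = 10, rounds = 100, players = 50, Win = 1.5, Lose = 0.6, head_prob = 0.5){

  # End results of all players: the final stakes are in row rounds + 1.
  # Startvalue is passed on so that simulation and transformation agree.
  End_res <- Many_Games(players = players, rounds = rounds, Startvalue = Startvalue,
                        Win = Win, Lose = Lose, head_prob = head_prob) %>%
    mutate(index = 1:(rounds + 1)) %>%
    melt(id.vars = 'index', variable.name = 'Spieler') %>%
    filter(index == rounds + 1)

  # Transformation from above: recovers the number of heads of each player
  Z <- logb(End_res$value / (Startvalue * Lose^rounds), base = (Win / Lose))

  g <- ggplot(as.data.frame(Z), aes(x = Z)) +
    geom_histogram(aes(y = after_stat(density)), color = "white", alpha = 0.8) +
    geom_density(alpha = 0, color = "red") +
    scale_x_continuous(breaks = round(seq(-3, 3, 1) * sd(Z) + mean(Z))) +
    stat_function(fun = dnorm, args = list(mean = rounds * head_prob, sd = sqrt(rounds * head_prob * (1 - head_prob))), color = "blue") +
    annotate("text", x = Inf, y = Inf, label = "\nExpected normal distribution", vjust = 1, hjust = 1, color = "blue") +
    annotate("text", x = Inf, y = Inf, label = "\n\nKernel density estimate", vjust = 1, hjust = 1, color = "red") +
    theme_bw()

  return(g)
}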

Example for 500 players and 500 rounds:

Log_Games(players = 500,rounds = 500)

Questions:
1. How likely is it to win the game as a player (above the red line in Many_Games_plot)?
2. How likely is it to win more than the total stakes (above the blue line in Many_Games_plot)?

To answer these questions, we first calculate the number of heads needed to win, based on the number of rounds played. Let \(k\) be the number of heads in \(n\) rounds. The player wins the game if \[s\cdot f_w^k\cdot f_l^{n-k} > s \Leftrightarrow k > -n\cdot \log_a(f_l) = \frac{n}{1-\frac{\ln(f_w)}{\ln(f_l)}}\] So for \(k\) heads with \(k > \frac{n}{1-\frac{\ln(f_w)}{\ln(f_l)}}\) the player wins the game.

Now let \(n_p\) be the number of players. The number of heads needed to win more than the total stakes (based on the number of rounds played and the number of players) follows from \[s\cdot f_w^k\cdot f_l^{n-k} > s\cdot n_p \Leftrightarrow k > -n\cdot \log_a(f_l) + \log_a(n_p) = \frac{n}{1-\frac{\ln(f_w)}{\ln(f_l)}} + \frac{\ln(n_p)}{\ln(\frac{f_w}{f_l})}\]

So for \(k\) heads with \(k > \frac{n}{1-\frac{\ln(f_w)}{\ln(f_l)}} + \frac{\ln(n_p)}{\ln(\frac{f_w}{f_l})}\) the player wins more than the total stakes.
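To put numbers on these two thresholds, here is a minimal sketch for the original game (100 rounds, factors 1.5 and 0.6, 10 players). `heads_needed` and `stake_after` are hypothetical helper names introduced only for this illustration.

```r
# Hypothetical helper: real-valued threshold k from the formulas above.
# players = 1 gives the break-even threshold, players > 1 the threshold
# for winning more than the total stakes.
heads_needed <- function(rounds = 100, players = 1, Win = 1.5, Lose = 0.6) {
  rounds / (1 - log(Win) / log(Lose)) + log(players) / log(Win / Lose)
}

# Hypothetical helper: final stake after k heads in the given number of rounds
stake_after <- function(k, rounds = 100, Startvalue = 10, Win = 1.5, Lose = 0.6) {
  Startvalue * Win^k * Lose^(rounds - k)
}

k_win <- heads_needed()               # threshold for making a profit
k_all <- heads_needed(players = 10)   # threshold for beating the total stakes

print(ceiling(c(k_win, k_all)))       # 56 and 59 heads
print(stake_after(55:56))             # below and above the stake of 10
print(stake_after(58:59))             # below and above the total stakes of 100
```

So a player needs at least 56 heads out of 100 to make any profit, and only three more heads to win more than all ten stakes combined.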

Indeed we can see that \(k\) is very sensitive to the ratio of the logarithms of the factors. The intuition behind the logarithm here is that \(f_w > 1\) can be arbitrarily large, while \(f_l < 1\) is confined to the interval between 0 and 1. So to compare them appropriately, we take their respective logarithms. This confirms our assumptions from before, i.e. it takes longer to win the smaller the geometric mean is, and the number of heads needed to win is also very sensitive to the geometric mean. The geometric mean increases with higher \(f_w\) and higher \(f_l\), while the number of heads \(k\) needed to win decreases, based on the equation above.

Note: Because it’s hard to see that the last statement is true, here is a clearer explanation: We define \(h_{n,f_w,n_p}:(0,1) \to \mathbb{R}, h_{n,f_w,n_p}(f_l) = \frac{n}{1-\frac{\ln(f_w)}{\ln(f_l)}} + \frac{\ln(n_p)}{\ln(\frac{f_w}{f_l})}\). The function \(h_{n,f_w,n_p}\) is a decreasing function in its argument \(f_l\) (provided that \(f_w^n > n_p\), i.e. that winning more than the total stakes is possible at all).
We define \(g_{n,f_l,n_p}:(1,\infty) \to \mathbb{R}, g_{n,f_l,n_p}(f_w) = \frac{n}{1-\frac{\ln(f_w)}{\ln(f_l)}} + \frac{\ln(n_p)}{\ln(\frac{f_w}{f_l})}\). The function \(g_{n,f_l,n_p}\) is a decreasing function in its argument \(f_w\). Therefore \(k = h_{n,f_w,n_p}(f_l) = g_{n,f_l,n_p}(f_w)\) decreases when \(f_w\) or \(f_l\) increases and vice versa.

Furthermore \(k\) increases with increasing \(n\) and \(n_p\). The number of other players has only a small impact on the number of heads needed to get above the blue line, since it enters only through its logarithm. This can yet again be attributed to the multiplicative effect of the wins and losses and the fact that the total stakes only increase linearly with the number of players.

Calculating Probabilities

We want to know \(P(Z \geq K)\). Based on the visualization it’s fair to assume that for enough rounds \(Z\) is approximately normally distributed, i.e. \(Z \dot\sim N(np, np(1-p))\). Therefore it follows that \[P(Z \geq K) = P\left(\frac{Z-np}{\sqrt{np(1-p)}} \geq \frac{K-np}{\sqrt{np(1-p)}}\right) = 1- \Phi\left(\frac{K-np}{\sqrt{np(1-p)}}\right)\]

Now we can substitute the previously calculated minimum value \(k\) for \(K\):

\[P(\text{Winning}) = P(X_n \geq s) = P\Bigg(Z \geq \frac{n}{1-\frac{\ln(f_w)}{\ln(f_l)}}\Bigg) = 1- \Phi\Bigg(\frac{\frac{n}{1-\frac{\ln(f_w)}{\ln(f_l)}}-np}{\sqrt{np(1-p)}}\Bigg)\]

We can quickly transform the equation to obtain the probability for winning more than the total stakes. To do so we simply take the other formula for \(k\).

\[P(\text{Winning it all}) = P(X_n \geq s\cdot n_p) = P\Bigg(Z \geq \frac{n}{1-\frac{\ln(f_w)}{\ln(f_l)}} + \frac{\ln(n_p)}{\ln(\frac{f_w}{f_l})}\Bigg) = 1- \Phi\Bigg(\frac{\frac{n}{1-\frac{\ln(f_w)}{\ln(f_l)}} + \frac{\ln(n_p)}{\ln(\frac{f_w}{f_l})}-np}{\sqrt{np(1-p)}}\Bigg)\]

We can write this as a function to make better use of it:

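Winning <- function(rounds = 100, players = 1, Win = 1.5, Lose = 0.6, head_prob = 0.5, ...){

  # Numerator ("Zaehler"): number of heads needed minus the expected number of heads
  zaehler = (rounds/(1-log(Win)/log(Lose))) + log(players)/log(Win/Lose) - rounds*head_prob
  # Denominator ("Nenner"): standard deviation of the number of heads
  nenner = sqrt(rounds*head_prob*(1-head_prob))

  # Upper tail of the standard normal distribution, i.e. 1 - Phi(zaehler/nenner)
  return(pnorm(zaehler/nenner, lower.tail = FALSE))

}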

The probability of winning the coin game from the beginning (100 rounds, fair coin, winning factor of 1.5, losing factor of 0.6) is
Winning(rounds=100): 0.125101
and the probability of winning it all (10 players) is
Winning(rounds=100, players=10): 0.0492217.

The probability of winning the coin game for 500 rounds, a fair coin, a winning factor of 1.5 and a losing factor of 0.6 is
Winning(rounds=500): 0.0050679
and the probability of winning it all with 100 players is
Winning(rounds=500, players=100): 0.001261.
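Since the number of heads is a whole number, these normal approximations can be compared with the exact binomial tail probability. The sketch below does this for the break-even case of the original game; it recomputes the threshold inline rather than calling Winning, so it runs on its own.

```r
rounds <- 100; head_prob <- 0.5; Win <- 1.5; Lose <- 0.6

# Real-valued threshold: the player wins with more than k heads
k <- rounds / (1 - log(Win) / log(Lose))

# Normal approximation, as used in the Winning function above
approx_prob <- pnorm((k - rounds * head_prob) /
                       sqrt(rounds * head_prob * (1 - head_prob)),
                     lower.tail = FALSE)

# Exact probability: P(Z > k) = P(Z > floor(k)) for the binomial count Z
exact_prob <- pbinom(floor(k), size = rounds, prob = head_prob,
                     lower.tail = FALSE)

print(c(approx = approx_prob, exact = exact_prob))
```

In this case the exact probability is slightly higher than the approximation, because the approximation is evaluated at the real-valued threshold without a continuity correction.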

Using the Winning function we can also create some 3D plots to visualize relationships between different variables.

As explained before, the number of heads needed for “winning it all” depends only logarithmically on the number of players \(n_p\) and linearly on the number of rounds played \(n\). We can see the effect on the winning probability in the following plot:

Another interesting relationship is the one between \(f_w\) and \(f_l\):

As expected and explained, the winning probability is very sensitive to \(f_w\) and \(f_l\).
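A surface like this can be computed with `outer` and drawn with base R graphics. The following is only a sketch of one way to do it, not necessarily how the plots in this text were made; `win_prob` is a hypothetical stand-alone copy of the formula in Winning, and the grid ranges are arbitrary.

```r
# Hypothetical stand-alone version of the winning probability (normal approximation)
win_prob <- function(Win, Lose, rounds = 100, players = 1, head_prob = 0.5) {
  zaehler <- rounds / (1 - log(Win) / log(Lose)) +
    log(players) / log(Win / Lose) - rounds * head_prob
  nenner <- sqrt(rounds * head_prob * (1 - head_prob))
  pnorm(zaehler / nenner, lower.tail = FALSE)
}

# Grid of factors (arbitrary ranges chosen for illustration)
Win_grid  <- seq(1.1, 2.5, by = 0.05)
Lose_grid <- seq(0.3, 0.9, by = 0.05)

# Matrix of winning probabilities: rows follow Win_grid, columns follow Lose_grid
prob <- outer(Win_grid, Lose_grid, win_prob)

# Simple 3D surface with base graphics
persp(Win_grid, Lose_grid, prob, theta = 35, phi = 25,
      xlab = "Win", ylab = "Lose", zlab = "P(winning)", ticktype = "detailed")
```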

Conclusion

As we can see, the guy from the bar might be doing himself a favor by not offering this game too often, because with an increasing number of players it becomes almost certain that he will lose all his money to a few lucky players. After all, the amount of money that a player can win is very high, but the amount of money a player can lose is just the starting value, i.e. the price of the game. Of course the guy could also choose better values for \(f_w\) and \(f_l\), but it’s not obvious whether the game would then still be attractive to players. After all, the guy who calculated the average return as \(\frac12 \cdot 1.5 + \frac12 \cdot 0.6 = 1.05\) would not play this game if \(f_l\) were below \(0.5\), or below \(2-f_w\) in general. This line \(f_l = 2-f_w\) would pretty much be the lower boundary of the sensitive area in the plot above. So instead of adjusting the factors, the guy who offered the game should stop lecturing the bar guests and go back to studying statistics with Dr. Fabian Scheipl.